Run the following steps only if you are running Datalab on your local desktop or laptop (not on a GCE VM):
If you run Datalab on a GCE VM, make sure the VM's project has the Machine Learning API and the Dataflow API enabled.
In [1]:
# Name the bucket after the current Datalab project
bucket = 'gs://' + datalab_project_id() + '-coast'
In [2]:
# Create the bucket (gsutil mb reports an error if it already exists)
!gsutil mb $bucket
All the data is stored under gs://cloud-datalab/sampledata/coast. See https://storage.googleapis.com/tamucc_coastline/GooglePermissionForImages_20170119.pdf for details about permission to use the images.
Load the data from the CSV files into BigQuery tables.
In [3]:
import google.datalab.bigquery as bq
# Create the dataset
bq.Dataset('coast').create()
schema = [
{'name':'image_url', 'type': 'STRING'},
{'name':'label', 'type': 'STRING'},
]
# Create the train and eval tables (replacing any existing ones) and load them from the CSV files
train_table = bq.Table('coast.train').create(schema=schema, overwrite=True)
train_table.load('gs://cloud-datalab/sampledata/coast/train.csv', mode='overwrite', source_format='csv')
eval_table = bq.Table('coast.eval').create(schema=schema, overwrite=True)
eval_table.load('gs://cloud-datalab/sampledata/coast/eval.csv', mode='overwrite', source_format='csv')
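Each `load` call fills a table whose columns follow `schema` in order. The sketch below mimics that column-to-field mapping locally with the standard `csv` module. It assumes (this is not confirmed by the notebook) that `train.csv` and `eval.csv` have no header row and exactly two columns, `image_url` then `label`; the helper `rows_from_csv` and the sample rows are hypothetical, for illustration only.

```python
import csv
import io

SCHEMA = [
    {'name': 'image_url', 'type': 'STRING'},
    {'name': 'label', 'type': 'STRING'},
]


def rows_from_csv(text, schema=SCHEMA):
    """Map each CSV line to a dict keyed by the schema's field names, in column order."""
    names = [field['name'] for field in schema]
    rows = []
    for lineno, values in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not values:
            continue  # skip blank lines
        if len(values) != len(names):
            raise ValueError('line %d: expected %d columns, got %d'
                             % (lineno, len(names), len(values)))
        rows.append(dict(zip(names, values)))
    return rows


# Hypothetical rows; the real files live under gs://cloud-datalab/sampledata/coast.
sample = 'gs://example-bucket/a.jpg,label_a\ngs://example-bucket/b.jpg,label_b\n'
print(rows_from_csv(sample))
```

A row with the wrong number of columns raises `ValueError` here; BigQuery's own handling of bad rows is controlled by its load options, not by this sketch.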
See the following file for a description of each label:
In [4]:
!gsutil cat gs://cloud-datalab/sampledata/coast/dict_explanation.csv
In [5]:
%%bq query --name coast_train
SELECT image_url, label FROM coast.train
In [6]:
coast_train.execute().result()
In [7]:
from google.datalab.ml import *
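# Wrap each BigQuery table as a dataset
ds_train = BigQueryDataSet(table='coast.train')
ds_eval = BigQueryDataSet(table='coast.eval')
# Pull a random sample of 1000 rows from each table into a pandas DataFrame
df_train = ds_train.sample(1000)
df_eval = ds_eval.sample(1000)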
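`sample(1000)` pulls a random subset of each table into memory, which keeps the plots below cheap. The exact sampling strategy belongs to Datalab; the sketch below only illustrates the idea with the standard library's `random.sample`, drawing a fixed-size sample without replacement. The function `sample_rows`, its `seed` parameter, and the synthetic rows are hypothetical, and this sketch caps `n` at the number of rows, which may differ from how Datalab handles an oversized `n`.

```python
import random


def sample_rows(rows, n, seed=None):
    """Return up to n rows chosen uniformly at random without replacement."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    return rng.sample(rows, min(n, len(rows)))


# Synthetic rows standing in for a table; not the real coast data.
rows = [{'image_url': 'img%d' % i, 'label': 'label_%d' % (i % 3)} for i in range(5000)]
subset = sample_rows(rows, 1000, seed=0)
print(len(subset))  # 1000
```

Fixing the seed is a design choice for repeatable experiments; leaving it as `None` gives a fresh sample on every run.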
In [8]:
# Number of sampled training images per label
df_train.label.value_counts().plot(kind='bar');
In [9]:
# Number of sampled eval images per label
df_eval.label.value_counts().plot(kind='bar');
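The two bar charts show how many sampled images fall under each label, a quick check that the train and eval sets have a similar class balance. The same comparison can be made numerically. The stdlib sketch below computes each label's share of a sample with `collections.Counter` and reports the largest gap between two samples; the helper names and the toy label lists are illustrative only.

```python
from collections import Counter


def label_shares(labels):
    """Fraction of the sample that carries each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def max_share_gap(train_labels, eval_labels):
    """Largest absolute difference in a label's share between two samples."""
    train_share = label_shares(train_labels)
    eval_share = label_shares(eval_labels)
    all_labels = set(train_share) | set(eval_share)
    return max(abs(train_share.get(k, 0.0) - eval_share.get(k, 0.0)) for k in all_labels)


# Toy labels, not the real coast classes.
train_labels = ['a'] * 6 + ['b'] * 4
eval_labels = ['a'] * 5 + ['b'] * 5
print(label_shares(train_labels))  # {'a': 0.6, 'b': 0.4}
print(max_share_gap(train_labels, eval_labels))
```

Because a pandas Series is iterable, `label_shares(df_train.label)` works on the sampled DataFrames as well; within pandas itself, `df_train.label.value_counts(normalize=True)` gives the same per-label fractions.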